
    Adding semantic modules to improve goal-oriented analysis of data warehouses using I-star

    The success rate of data warehouse (DW) development is improved by performing a requirements elicitation stage in which the users’ needs are modeled. Currently, among the different proposals for modeling requirements, there is a special focus on goal-oriented models, and in particular on the i* framework. In order to adapt this framework for DW development, we previously developed a UML profile for DWs. However, like the general i* framework, the proposal lacks modularity. This has an especially negative impact on DW development, since DW requirement models tend to include a huge number of elements with crossed relationships between them. In turn, the readability of the models is decreased, harming their utility and increasing the error rate and development time. In this paper, we propose an extension of our i* profile for DWs considering the modularization of goals. We provide a set of guidelines in order to correctly apply our proposal. Furthermore, we have performed an experiment in order to assess the validity of our proposal. The benefits of our proposal are an increase in the modularity and scalability of the models which, in turn, increases the error correction capability and makes complex models easier to understand for DW developers and non-expert users. This work has been partially supported by the ProS-Req (TIN2010-19130-C02-01) project and by the MESOLAP (TIN2010-14860) and SERENIDAD (PEII-11-0327-7035) projects from the Spanish Ministry of Education and the Junta de Comunidades de Castilla-La Mancha, respectively. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).

    An Approach to Automatically Detect and Visualize Bias in Data Analytics

    Data Analytics and Artificial Intelligence (AI) are increasingly driving key business decisions and business processes. Any flaws in the interpretation of analytic results or AI outputs can lead to significant economic losses and reputation damage. Among existing flaws, one of the most often overlooked is the use of biased data and imbalanced datasets. When it goes undetected, data bias warps the meaning of data and has a devastating effect on AI results. Existing approaches deal with data bias by constraining the data model, altering its composition until the data is no longer biased. Unfortunately, studies have shown that crucial information about the nature of the data may be lost during this process. Therefore, in this paper we propose an alternative process, one that detects data biases and presents biased data in a visual way, so that users can comprehend how the data is structured and decide whether or not constraining approaches are applicable in their context. Our approach detects the existence of biases in datasets through our proposed algorithm and generates a series of visualizations in a way that is understandable to users, including non-expert ones. In this way, users become aware not only of the existence of biases in the data, but also of how they may impact their analytics and AI algorithms, thus avoiding undesired results. This work has been co-funded by the ECLIPSE-UA (RTI2018-094283-B-C32) project funded by the Spanish Ministry of Science, Innovation, and Universities. Ana Lavalle holds an Industrial PhD Grant (I-PI 03-18) co-funded by the University of Alicante and the Lucentia Lab Spin-off Company.
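
    As a rough illustration of the detection step, the sketch below flags categorical attributes whose majority value dominates the dataset and plots their distributions for inspection. The imbalance-ratio heuristic, threshold and column names are assumptions for illustration only; the paper's actual detection algorithm is not reproduced here.

```python
# Minimal sketch, assuming a simple imbalance-ratio heuristic as a
# stand-in for the paper's detection algorithm.
import pandas as pd
import matplotlib.pyplot as plt

def flag_imbalanced(df: pd.DataFrame, threshold: float = 0.8) -> list[str]:
    """Return categorical columns whose majority value exceeds `threshold` of rows."""
    flagged = []
    for col in df.select_dtypes(include="object").columns:
        # value_counts sorts descending, so iloc[0] is the majority share
        if df[col].value_counts(normalize=True).iloc[0] > threshold:
            flagged.append(col)
    return flagged

# Hypothetical data: "gender" is heavily imbalanced, "region" is not
df = pd.DataFrame({"gender": ["M"] * 90 + ["F"] * 10,
                   "region": ["EU"] * 55 + ["US"] * 45})
for col in flag_imbalanced(df):
    df[col].value_counts().plot(kind="bar", title=f"Possible bias: {col}")
    plt.show()
```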

    Publishing a Scorecard for Evaluating the Use of Open-Access Journals Using Linked Data Technologies

    Open access journals collect, preserve and publish scientific information in digital form, but it is still difficult, not only for users but also for digital libraries, to evaluate the usage and impact of this kind of publication. This problem can be tackled by introducing Key Performance Indicators (KPIs), allowing us to objectively measure the performance of the journals relative to the objectives pursued. In addition, Linked Data technologies constitute an opportunity to enrich the information provided by KPIs, connecting them to relevant datasets across the web. This paper describes a process to develop and publish a scorecard on the semantic web, based on the ISO 2789:2013 standard and using Linked Data technologies, in such a way that it can be linked to related datasets. Furthermore, methodological guidelines are presented along with their corresponding activities. The proposed process was applied to the open journal system of a university, including the definition of the KPIs linked to the institutional strategies, the extraction, cleaning and loading of data from the data sources into a data mart, the transformation of the data into RDF (Resource Description Framework), and the publication of the data by means of a SPARQL endpoint using the OpenLink Virtuoso application. Additionally, the RDF Data Cube vocabulary has been used to publish the multidimensional data on the web. The visualization was created using CubeViz, a faceted browser, to present the KPIs in interactive charts. This work has been partially supported by the Prometeo Project by SENESCYT, Ecuadorian Government.
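
    To give a flavor of how KPIs published this way could be consumed, the sketch below queries RDF Data Cube observations from a SPARQL endpoint. The endpoint URL is a placeholder, not the endpoint described in the paper; only the qb: vocabulary prefix is standard.

```python
# Hedged sketch: listing qb:Observation triples from a SPARQL endpoint.
# The endpoint URL is hypothetical; replace it with a real Virtuoso endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")  # placeholder endpoint
endpoint.setQuery("""
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs ?dim ?value WHERE {
  ?obs a qb:Observation ;
       ?dim ?value .
} LIMIT 20
""")
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["obs"]["value"], row["dim"]["value"], row["value"]["value"])
```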

    Data Model for Storage and Retrieval of Legislative Documents in Digital Libraries Using Linked Data

    Many countries have provided online access to some types of legislative documents by subject, keywords or date. Nevertheless, the possibility of querying historical versions of the documents is usually an uncommon feature. The dispersion of laws and other legislative documents and their continuous changes make it difficult to generate and query valid legislative information for a given date. Furthermore, the ripple effect of modifications such as updates, insertions or derogations affecting the entire body of a law or part of it is not always visible to the citizens who are looking for legislative information. Some issues related to change management of legislative documents can be identified: how to apply the history of changes to a version of a legislative document to obtain a new version, and what type of data model might best satisfy temporal queries, store new versions of documents or obtain them dynamically. Access to all versions of a document and its fragments is important in legislative queries, in order to be certain which law was in force when a case happened. Law documents are produced and stored in information systems with different data models to access and retrieve information about them in a large-scale manner, but most of these systems do not have law change management functions. Web standards, such as XML, XSLT and RDF, facilitate the separation between content, presentation and metadata, thus contributing to better annotation and exploitation of information from these documents and their fragments, improving historical queries and the generation of versions of legislative documents. This paper presents a data model for the storage and retrieval of different versions of legislative documents using Linked Data, a method of publishing structured interlinked data, to manage the relations between legislative documents and their changes. Document structures, changes to legislation, metadata and the requirements of historical queries are analyzed in this work. Furthermore, the proposed model facilitates the historical querying of legislative documents and consolidation procedures, allowing relationships between documents and fragments to be updated without changes to the original documents. The model has been tested with Ecuadorian laws, but it could be used for the law systems of other countries because it is independent of the legislative framework. This work has been partially supported by the Prometeo Project by SENESCYT, Ecuadorian Government.
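
    A minimal sketch of the kind of versioning relations such a model needs is shown below, using rdflib. All URIs and the property name linking versions are assumptions for illustration; they are not the paper's actual vocabulary.

```python
# Illustrative sketch (not the paper's exact model): two versions of a law
# linked in a version chain, with a validity date on the newer one.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, DCTERMS

LEG = Namespace("http://example.org/legislation/")  # hypothetical namespace
g = Graph()
law_v1 = URIRef(LEG["law-42/version-1"])
law_v2 = URIRef(LEG["law-42/version-2"])
g.add((law_v1, RDF.type, LEG.LegalDocumentVersion))
g.add((law_v2, RDF.type, LEG.LegalDocumentVersion))
g.add((law_v2, LEG.previousVersion, law_v1))            # version chain
g.add((law_v2, DCTERMS.valid, Literal("2015-01-01")))   # in force from this date
print(g.serialize(format="turtle"))
```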

    Solar Energy Prediction Model Based on Artificial Neural Networks and Open Data

    With climate change exerting an ever stronger influence over governments and municipalities, sustainable development and renewable energy are gaining traction across the globe. This is reflected in the EU 2030 agenda, which envisions a future with universal access to affordable, reliable and sustainable energy. One of the challenges to achieving this vision lies in the low reliability of certain renewable sources. While both private individuals and public entities try to reach self-sufficiency through sustainable energy generation, it is unclear how much investment is needed to mitigate the unreliability introduced by natural factors such as varying wind speed and daylight across the year. In this sense, a tool that helps predict the energy output of sustainable sources across the year for a particular location can greatly aid in making sustainable energy investments more efficient. In this paper, we make use of Open Data sources, Internet of Things (IoT) sensors and installations distributed across Europe to create such a tool through the application of Artificial Neural Networks. We analyze how different factors affect the prediction of energy production and how Open Data can be used to predict the expected output of sustainable sources. As a result, we provide users with the necessary information to decide how much they wish to invest according to the desired energy output for their particular location. Compared to state-of-the-art proposals, our solution provides an abstraction layer focused on energy production, rather than radiation data, and can be trained and tailored for different locations using Open Data. Finally, our tests show that our proposal improves forecasting accuracy, obtaining a lower mean squared error (MSE) of 0.040 compared to an MSE of 0.055 from other proposals in the literature. This paper has been co-funded by the ECLIPSE-UA (RTI2018-094283-B-C32) project from the Spanish Ministry of Science, Innovation, and Universities; both Jose M. Barrera (I-PI 98/18) and Alejandro Reina (I-PI 13/20) hold Industrial PhD Grants co-funded by the University of Alicante and the Lucentia Lab Spin-off Company.
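
    As a rough sketch of the kind of model described, the following trains a small feed-forward network on weather-like features and reports the MSE. The feature set and data are synthetic placeholders, not the paper's Open Data or IoT sources, and the architecture is an assumption.

```python
# Minimal sketch: neural-network regression of energy output from
# weather-style features, evaluated with MSE. Synthetic data only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                              # e.g. irradiance, temperature, cloud cover
y = 5 * X[:, 0] - X[:, 2] + rng.normal(0, 0.1, 1000)   # synthetic energy output

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("MSE:", mean_squared_error(y_te, model.predict(X_te)))
```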

    An Iterative Methodology for Defining Big Data Analytics Architectures

    Thanks to the advances achieved in the last decade, the lack of adequate technologies to deal with Big Data characteristics such as Data Volume is no longer an issue. Instead, recent studies highlight that one of the main Big Data issues is the lack of expertise to select adequate technologies and build the correct Big Data architecture for the problem at hand. In order to tackle this problem, we present our methodology for the generation of Big Data pipelines based on several requirements derived from Big Data features that are critical for the selection of the most appropriate tools and techniques. Thus, our approach reduces the know-how required to select and build Big Data architectures by providing a step-by-step methodology that guides Big Data architects in creating their Big Data pipelines for the case at hand. Our methodology has been tested in two use cases. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
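
    As a toy illustration of requirement-driven component selection, the sketch below maps stated Big Data requirements to candidate tools. The rules and tool names are assumptions made for illustration; the methodology's actual decision criteria are far richer than a lookup table.

```python
# Toy sketch only: suggest pipeline components from stated requirements.
REQUIREMENT_RULES = {
    "low_latency_queries": ["Apache Kylin", "Apache Druid"],
    "stream_ingestion":    ["Apache Kafka"],
    "batch_processing":    ["Apache Spark"],
}

def suggest_components(requirements: set[str]) -> set[str]:
    """Union of candidate tools for each stated requirement."""
    return {tool for req in requirements for tool in REQUIREMENT_RULES.get(req, [])}

print(suggest_components({"stream_ingestion", "batch_processing"}))
```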

    A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

    In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and user querying patterns. A wrong design of the OLAP cube significantly alters several key performance metrics, including: (i) the analytic capabilities of the cube (time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to aid Big Data OLAP designers in choosing the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
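
    To illustrate what pre-aggregation buys at query time, the sketch below materializes every group-by combination ("cuboid") of a tiny fact table using pandas. This only shows the underlying trade-off between cuboid count and query-time work; Kylin's actual storage and encoding machinery is not modeled here.

```python
# Sketch of cube pre-aggregation: materialize group-by summaries ahead of
# query time so queries hit a small aggregate, not the raw fact table.
import pandas as pd
from itertools import combinations

facts = pd.DataFrame({"country": ["ES", "ES", "FR"],
                      "year":    [2020, 2021, 2020],
                      "sales":   [10.0, 12.0, 7.0]})

dims = ["country", "year"]
cuboids = {
    subset: facts.groupby(list(subset), as_index=False)["sales"].sum()
    for n in range(1, len(dims) + 1)
    for subset in combinations(dims, n)
}
# A query on (country) now reads the tiny pre-aggregated table:
print(cuboids[("country",)])
```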

    Current state of Linked Data in digital libraries

    The Semantic Web encourages institutions, including libraries, to collect, link and share their data across the Web in order to ease its processing by machines and thereby obtain better queries and results. Linked Data technologies enable us to connect related data on the Web using the principles outlined by Tim Berners-Lee in 2006. Digital libraries have great potential to exchange and disseminate data linked to external resources using Linked Data. In this paper, a study of the current uses of Linked Data in digital libraries, including the most important implementations around the world, is presented. The study focuses on the vocabularies and ontologies selected, and on the benefits and problems encountered in implementing Linked Data in digital libraries. In addition, it identifies and discusses specific challenges that digital libraries face, offering suggestions for ways in which libraries can contribute to the Semantic Web. The study uses an adapted literature review methodology to find the data available to answer the research questions. It is based on the information found in the library websites recommended by the W3C Library Linked Data Incubator Group in 2011, and on scientific publications from Google Scholar, Scopus, ACM and Springer from the last 5 years. The libraries selected for the study are the National Library of France, the Europeana Library, the Library of Congress of the USA, the British Library and the National Library of Spain. In this paper, we outline the best practices found in each experience and identify gaps and future trends. This work was supported by the Prometeo Project from the Secretary of Higher Education, Science, Technology and Innovation (SENESCYT) of the Ecuadorian Government and by the project GEODAS-BI (TIN2012-37493-C03-03) supported by the Ministry of Economy and Competitiveness of Spain (MINECO). Alejandro Maté was funded by the Generalitat Valenciana (APOSTD/2014/064).

    An extension of iStar for Machine Learning requirements by following the PRISE methodology

    The rise of Artificial Intelligence (AI) and Deep Learning has led to Machine Learning (ML) becoming a common practice in academia and enterprise. However, a successful ML project requires deep domain knowledge as well as expertise in a plethora of algorithms and data processing techniques. This leads to a stronger dependency and need for communication between developers and stakeholders, where numerous requirements come into play. More specifically, in addition to functional requirements such as the output of the model (e.g. classification, clustering or regression), ML projects need to pay special attention to a number of non-functional and quality aspects particular to ML. These include explainability, noise robustness or equity, among others. Failure to identify and consider these aspects will lead to inadequate algorithm selection and the failure of the project. In this sense, capturing ML requirements becomes critical. Unfortunately, there is currently an absence of ML requirements modeling approaches. Therefore, in this paper we present the first i* extension for capturing ML requirements and apply it to two real-world projects. Our proposal covers two main objectives for ML requirements: (i) it allows domain experts to specify objectives and quality aspects to be met by the ML solution, and (ii) it facilitates the selection and justification of the most adequate ML approaches. Our case studies show that our work enables better ML algorithm selection and preprocessing implementation tailored to each algorithm, and aids in identifying missing data. In addition, they also demonstrate the flexibility of our proposal to adapt to different domains. This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), a smart data holistic approach for context-aware data analytics: smarter machine learning for business modeling and analytics, funded by the Spanish Ministry of Science and Innovation, and by the BALLADEER (PROMETEO/2021/088) project, a Big Data analytical platform for the diagnosis and treatment of Attention Deficit Hyperactivity Disorder (ADHD) featuring extended reality, funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana). A. Reina-Reina (I-PI 13/20) holds an Industrial PhD Grant co-funded by the University of Alicante and the Lucentia Lab Spin-off Company.

    A Methodology based on Rebalancing Techniques to measure and improve Fairness in Artificial Intelligence algorithms

    Artificial Intelligence (AI) has become one of the key drivers for the next decade. As important decisions are increasingly supported or directly made by AI systems, concerns regarding the rationale and fairness of their outputs are becoming more and more prominent. Following the recent interest in fairer predictions, several metrics for measuring fairness have been proposed, leading to different objectives which may need to be addressed in different ways. In this paper, we (i) propose a methodology for analyzing and improving fairness in AI predictions by selecting the sensitive attributes that should be protected; (ii) analyze how the most common rebalance approaches affect the fairness of AI predictions and how they compare to the alternatives of removing each group or creating a separate classifier for each group within a protected attribute; and (iii) generate through our methodology a set of tables that can be easily computed for choosing the best alternative in each particular case. The main advantage of our methodology is that it allows AI practitioners to measure and improve fairness in AI algorithms in a systematic way. To validate our proposal, we have applied it to the COMPAS dataset, which several previous studies have shown to be biased. This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), funded by the Spanish Ministry of Science and Innovation, and the BALLADEER (PROMETEO/2021/088) project, funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana).
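
    A minimal sketch of one fairness metric and one rebalance strategy in this vein is shown below: the demographic-parity gap in the data, before and after oversampling the minority label within each protected group. The column names and data are illustrative assumptions; the paper's full methodology and comparison tables are not reproduced here.

```python
# Hedged sketch: demographic-parity gap and per-group label oversampling.
import pandas as pd

# Hypothetical data: group A is 75% positive, group B only 25%
df = pd.DataFrame({"group": ["A"] * 80 + ["B"] * 20,
                   "label": [1] * 60 + [0] * 20 + [1] * 5 + [0] * 15})

def parity_gap(data: pd.DataFrame) -> float:
    """Absolute difference in positive-label rate between groups."""
    rates = data.groupby("group")["label"].mean()
    return float(rates.max() - rates.min())

print("gap before:", parity_gap(df))        # 0.75 - 0.25 = 0.50
# One common rebalance: within each group, oversample the minority label
# up to that group's majority-label count, equalizing label rates.
balanced = pd.concat(
    cell.sample(df[df["group"] == grp]["label"].value_counts().max(),
                replace=True, random_state=0)
    for (grp, _), cell in df.groupby(["group", "label"])
)
print("gap after :", parity_gap(balanced))  # 0.0
```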